{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center/>MindInsight的模型溯源和数据溯源体验"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 概述\n",
    "在调参的场景下，需要多次调整模型超参并进行多次训练，在这个过程，往往需要手动记录每次训练使用参数以及训练结果。为此，MindSpore提供了自动记录模型参数，训练信息，以及训练结果评估指标的功能，并通过MindInsight进行可视化展示。本次体验会从MindInsight的数据记录，可视化效果，如何方便用户在模型调优，数据调优上做一次整体流程的体验。\n",
    "\n",
    "下面按照MindSpore的训练数据模型的正常步骤进行，当使用`SummaryCollector`进行数据保存操作时，会增加相应的说明，本次体验的整体流程如下：\n",
    "\n",
    "1. 数据集的准备，这里使用的是MNIST数据集。\n",
    "\n",
    "2. 构建一个网络，这里使用LeNet网络。\n",
    "\n",
    "3. 搭建及运行训练网络和测试网络。\n",
    "\n",
    "4. 启动MindInsight服务。\n",
    "\n",
    "5. 模型溯源的使用。调整模型参数多次训练并存储数据，并使用MindInsight的模型溯源功能对不同优化参数下训练产生的模型作对比，了解MindSpore中的各类优化对训练过程的影响及如何调优训练过程。\n",
    "\n",
    "6. 数据溯源的使用。调整数据参数多次训练并存储数据，并使用MindInsight的数据溯源功能对不同数据集下训练产生的模型进行对比分析，了解如何调优。\n",
    "\n",
    "\n",
    "本次体验将使用快速入门案例作为基础用例，将MindInsight的模型溯源和数据溯源的数据记录功能加入到案例中，快速入门案例的源码请参考：<https://gitee.com/mindspore/docs/blob/master/tutorials/tutorial_code/lenet/lenet.py>。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练的数据集下载"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 准备\n",
    "#### 方法一：\n",
    "从以下网址下载，并将数据包解压缩后放至Jupyter的工作目录下：<br/>训练数据集：{\"<http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz>\", \"<http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz>\"}\n",
    "<br/>测试数据集：{\"<http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz>\", \"<http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz>\"}<br/>我们用下面代码查询jupyter的工作目录。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.getcwd()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "训练数据集放在----`Jupyter工作目录+\\MNIST_Data\\train\\`，此时`train`文件夹内应该包含两个文件，`train-images-idx3-ubyte`和`train-labels-idx1-ubyte` <br/>测试数据集放在----`Jupyter工作目录+\\MNIST_Data\\test\\`，此时`test`文件夹内应该包含两个文件，`t10k-images-idx3-ubyte`和`t10k-labels-idx1-ubyte`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 方法二：\n",
    "直接执行下面代码，会自动进行训练集的下载与解压，但是整个过程根据网络好坏情况会需要花费几分钟时间。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request   \n",
    "from urllib.parse import urlparse\n",
    "import gzip \n",
    "\n",
    "def unzip_file(gzip_path):\n",
    "    \"\"\"unzip dataset file\n",
    "    Args:\n",
    "        gzip_path (str): Dataset file path\n",
    "    \"\"\"\n",
    "    open_file = open(gzip_path.replace('.gz',''), 'wb')\n",
    "    gz_file = gzip.GzipFile(gzip_path)\n",
    "    open_file.write(gz_file.read())\n",
    "    gz_file.close()\n",
    "    \n",
    "def download_dataset():\n",
    "    \"\"\"Download the dataset from http://yann.lecun.com/exdb/mnist/.\"\"\"\n",
    "    print(\"******Downloading the MNIST dataset******\")\n",
    "    train_path = \"./MNIST_Data/train/\" \n",
    "    test_path = \"./MNIST_Data/test/\"\n",
    "    train_path_check = os.path.exists(train_path)\n",
    "    test_path_check = os.path.exists(test_path)\n",
    "    if not train_path_check and not test_path_check:\n",
    "        os.makedirs(train_path)\n",
    "        os.makedirs(test_path)\n",
    "    train_url = {\"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\", \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"}\n",
    "    test_url = {\"http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\", \"http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\"}\n",
    "    \n",
    "    for url in train_url:\n",
    "        url_parse = urlparse(url)\n",
    "        # split the file name from url\n",
    "        file_name = os.path.join(train_path,url_parse.path.split('/')[-1])\n",
    "        if not os.path.exists(file_name.replace('.gz', '')):\n",
    "            file = urllib.request.urlretrieve(url, file_name)\n",
    "            unzip_file(file_name)\n",
    "            os.remove(file_name)\n",
    "            \n",
    "    for url in test_url:\n",
    "        url_parse = urlparse(url)\n",
    "        # split the file name from url\n",
    "        file_name = os.path.join(test_path,url_parse.path.split('/')[-1])\n",
    "        if not os.path.exists(file_name.replace('.gz', '')):\n",
    "            file = urllib.request.urlretrieve(url, file_name)\n",
    "            unzip_file(file_name)\n",
    "            os.remove(file_name)\n",
    "\n",
    "download_dataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这样就完成了数据集的下载解压缩工作。"
   ]
  },
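  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To double-check that the files landed where the rest of this notebook expects them, a quick listing such as the one below can be used. This is only a sanity check and assumes the default `./MNIST_Data` location used above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# walk ./MNIST_Data and print every file found; we expect two files under\n",
    "# train/ and two under test/, with the .gz suffix already stripped\n",
    "for root, _, files in os.walk(\"./MNIST_Data\"):\n",
    "    for file_name in files:\n",
    "        print(os.path.join(root, file_name))"
   ]
  },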
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据集处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据集处理对于训练非常重要，好的数据集可以有效提高训练精度和效率。在加载数据集前，我们通常会对数据集进行一些处理。\n",
    "<br/>我们定义一个函数`create_dataset`来创建数据集。在这个函数中，我们定义好需要进行的数据增强和处理操作：\n",
    "\n",
    "1. 定义数据集。\n",
    "2. 定义进行数据增强和处理所需要的一些参数。\n",
    "3. 根据参数，生成对应的数据增强操作。\n",
    "4. 使用`map`映射函数，将数据操作应用到数据集。\n",
    "5. 对生成的数据集进行处理。\n",
    "\n",
    "具体的数据集操作可以在MindInsight的数据溯源中进行可视化分析。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mindspore.dataset.transforms.vision.c_transforms as CV\n",
    "import mindspore.dataset.transforms.c_transforms as C\n",
    "from mindspore.dataset.transforms.vision import Inter\n",
    "from mindspore.common import dtype as mstype\n",
    "import mindspore.dataset as ds\n",
    "\n",
    "def create_dataset(data_path, batch_size=32, repeat_size=1,\n",
    "                   num_parallel_workers=1):\n",
    "    \"\"\" create dataset for train or test\n",
    "    Args:\n",
    "        data_path (str): Data path\n",
    "        batch_size (int): The number of data records in each group\n",
    "        repeat_size (int): The number of replicated data records\n",
    "        num_parallel_workers (int): The number of parallel workers\n",
    "    \"\"\"\n",
    "    # define dataset\n",
    "    mnist_ds = ds.MnistDataset(data_path)\n",
    "\n",
    "    # define some parameters needed for data enhancement and rough justification\n",
    "    resize_height, resize_width = 32, 32\n",
    "    rescale = 1.0 / 255.0\n",
    "    shift = 0.0\n",
    "    rescale_nml = 1 / 0.3081\n",
    "    shift_nml = -1 * 0.1307 / 0.3081\n",
    "\n",
    "    # according to the parameters, generate the corresponding data enhancement method\n",
    "    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)\n",
    "    # if you need to use SummaryCollector to extract image data, do not use the following normalize operator operation\n",
    "    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)\n",
    "    \n",
    "    rescale_op = CV.Rescale(rescale, shift)\n",
    "    hwc2chw_op = CV.HWC2CHW()\n",
    "    type_cast_op = C.TypeCast(mstype.int32)\n",
    "\n",
    "    # using map method to apply operations to a dataset\n",
    "    mnist_ds = mnist_ds.map(input_columns=\"label\", operations=type_cast_op, num_parallel_workers=num_parallel_workers)\n",
    "    mnist_ds = mnist_ds.map(input_columns=\"image\", operations=resize_op, num_parallel_workers=num_parallel_workers)\n",
    "    mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_op, num_parallel_workers=num_parallel_workers)\n",
    "    mnist_ds = mnist_ds.map(input_columns=\"image\", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)\n",
    "    mnist_ds = mnist_ds.map(input_columns=\"image\", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)\n",
    "    \n",
    "    # process the generated dataset\n",
    "    buffer_size = 10000\n",
    "    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script\n",
    "    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n",
    "    mnist_ds = mnist_ds.repeat(repeat_size)\n",
    "\n",
    "    return mnist_ds"
   ]
  },
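  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional check of `create_dataset`, the sketch below builds the training dataset and inspects the first batch. It assumes the MNIST data is under `./MNIST_Data/train` as prepared above; after the `HWC2CHW` transform, each image batch should have shape `(32, 1, 32, 32)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sanity check: build the dataset and look at the first batch\n",
    "ds_check = create_dataset(\"./MNIST_Data/train\", batch_size=32)\n",
    "print(\"batches per epoch:\", ds_check.get_dataset_size())\n",
    "for record in ds_check.create_dict_iterator():\n",
    "    print(\"image batch shape:\", record[\"image\"].shape)\n",
    "    print(\"label batch shape:\", record[\"label\"].shape)\n",
    "    break"
   ]
  },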
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义LeNet5网络"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mindspore.ops import operations as P\n",
    "import mindspore.nn as nn\n",
    "from mindspore.common.initializer import TruncatedNormal\n",
    "\n",
    "def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):\n",
    "    \"\"\"Conv layer weight initial.\"\"\"\n",
    "    weight = weight_variable()\n",
    "    return nn.Conv2d(in_channels, out_channels,\n",
    "                     kernel_size=kernel_size, stride=stride, padding=padding,\n",
    "                     weight_init=weight, has_bias=False, pad_mode=\"valid\")\n",
    "\n",
    "def fc_with_initialize(input_channels, out_channels):\n",
    "    \"\"\"Fc layer weight initial.\"\"\"\n",
    "    weight = weight_variable()\n",
    "    bias = weight_variable()\n",
    "    return nn.Dense(input_channels, out_channels, weight, bias)\n",
    "\n",
    "def weight_variable():\n",
    "    \"\"\"Weight initial.\"\"\"\n",
    "    return TruncatedNormal(0.02)\n",
    "\n",
    "class LeNet5(nn.Cell):\n",
    "    \"\"\"Lenet network structure.\"\"\"\n",
    "    def __init__(self):\n",
    "        super(LeNet5, self).__init__()\n",
    "        self.batch_size = 32 \n",
    "        self.conv1 = conv(1, 6, 5)\n",
    "        self.conv2 = conv(6, 16, 5)\n",
    "        self.fc1 = fc_with_initialize(16 * 5 * 5, 120)\n",
    "        self.fc2 = fc_with_initialize(120, 84)\n",
    "        self.fc3 = fc_with_initialize(84, 10)\n",
    "        self.relu = nn.ReLU()\n",
    "        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n",
    "        self.flatten = nn.Flatten()\n",
    "\n",
    "    def construct(self, x):\n",
    "        x = self.conv1(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.max_pool2d(x)\n",
    "        x = self.conv2(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.max_pool2d(x)\n",
    "        x = self.flatten(x)\n",
    "        x = self.fc1(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.fc2(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.fc3(x)\n",
    "        return x"
   ]
  },
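  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before training, an optional forward pass on random data can confirm that the network is wired correctly. This is a minimal sketch: it feeds one batch-shaped random tensor through `LeNet5` and expects logits of shape `(32, 10)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from mindspore import Tensor\n",
    "\n",
    "# one random batch shaped like the preprocessed MNIST images: (N, C, H, W)\n",
    "dummy = Tensor(np.random.rand(32, 1, 32, 32).astype(np.float32))\n",
    "print(LeNet5()(dummy).shape)  # expect (32, 10)"
   ]
  },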
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 记录数据及启动训练"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MindSpore 提供 `SummaryCollector` 进行记录训练过程中的信息。通过 `SummaryCollector` 的 `collect_specified_data` 参数，可以自定义记录指定数据。\n",
    "\n",
    "在本次体验中，我们将记录训练数据与数据集预处理的操作，我们将 `collect_specified_data` 中的 `collect_train_lineage`, `collect_eval_lineage`, `collect_dataset_graph` 设置成 `True`。SummaryCollector的更多用法，请参考[API文档](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.train.html?highlight=collector#mindspore.train.callback.SummaryCollector)。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mindspore.train.callback import SummaryCollector\n",
    "from mindspore.nn.metrics import Accuracy\n",
    "from mindspore import context\n",
    "from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits\n",
    "\n",
    "if __name__==\"__main__\":\n",
    "    context.set_context(mode=context.GRAPH_MODE, device_target = \"GPU\")\n",
    "    lr = 0.01\n",
    "    momentum = 0.9 \n",
    "    epoch_size = 10\n",
    "    mnist_path = \"./MNIST_Data\"\n",
    "    \n",
    "    net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')\n",
    "    repeat_size = 1\n",
    "    # create the network\n",
    "    network = LeNet5()\n",
    "\n",
    "    # define the optimizer\n",
    "    net_opt = nn.Momentum(network.trainable_params(), lr, momentum)\n",
    "    config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)\n",
    "    ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_lenet\", config=config_ck)\n",
    "    model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
    "    \n",
    "    collect_specified_data = {\"collect_eval_lineage\": True, \"collect_train_lineage\": True， \"collect_dataset_graph\": True}\n",
    "    summary_collector = SummaryCollector(summary_dir=\"./summary_base/quick_start_summary01\", collect_specified_data=collect_specified_data, keep_default_action=False) \n",
    "\n",
    "    # Start to train\n",
    "    ds_train = create_dataset(os.path.join(mnist_path, \"train\"), 32, repeat_size)\n",
    "    model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, summary_collector], dataset_sink_mode=True)\n",
    "\n",
    "    print(\"============== Starting Testing ==============\")\n",
    "    # load the saved model for evaluation\n",
    "    param_dict = load_checkpoint(\"checkpoint_lenet-10_1875.ckpt\")\n",
    "    # load parameter to the network\n",
    "    load_param_into_net(network, param_dict)\n",
    "    # load testing dataset\n",
    "    ds_eval = create_dataset(os.path.join(mnist_path, \"test\"))\n",
    "    acc = model.eval(ds_eval, callbacks=[summary_collector], dataset_sink_mode=True)\n",
    "    print(\"============== Accuracy:{} ==============\".format(acc))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 启动及关闭MindInsight服务"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里主要展示如何启用及关闭MindInsight，更多的命令集信息，请参考MindSpore官方网站：<https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/visualization_tutorials.html>。\n",
    "\n",
    "启动MindInsight服务命令：\n",
    "\n",
    "mindinsight start --summary-base-dir=./summary_base --port=8080\n",
    "\n",
    "\n",
    "- `--summary-base-dir`：MindInsight指定启动工作路径的命令；`./summary_base` 为 `SummaryCollector` 的 `summary_dir` 参数所指定的目录。\n",
    "- `--port`：MindInsight指定启动的端口，数值可以任意为1~65535的范围内。"
   ]
  },
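  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you prefer to manage the service from inside the notebook rather than a terminal, a shell call such as the sketch below may work. It assumes the `mindinsight` CLI is installed and on the PATH, and that port 8080 is free; the same approach applies to the stop command described next."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess\n",
    "\n",
    "# start MindInsight as a background service; once it is up, the UI is\n",
    "# reachable at http://127.0.0.1:8080\n",
    "subprocess.run([\"mindinsight\", \"start\", \"--summary-base-dir=./summary_base\", \"--port=8080\"])"
   ]
  },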
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "停止MindInsight服务命令：`mindinsight stop --port=8080`\n",
    "\n",
    "- `mindinsight stop`：MindInsight关闭服务命令。\n",
    "- `--port=8080`：即MindInsight服务开启在`8080`端口，所以这里写成`--port=8080`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型溯源\n",
    "\n",
    "### 连接到模型溯源地址\n",
    "\n",
    "浏览器中输入:`http://127.0.0.1:8080`，点击模型溯源，如下模型溯源界面："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/model_lineage_all.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如图所示，在页面的左上角中，我们可以选择要查看的训练信息，并通过这个功能挑选出我们要关注的训练信息，而不是要查看所有的训练信息，避免影响对不同训练作业进行对比分析。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 观察分析记录下来的溯源参数\n",
    "\n",
    "下图选择了数条不同参数下训练生成的模型进行对比："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/model_lineage_cp.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这几次训练的参数中，优化器，epoch和学习率都不一致，可以看到不同的训练生成的模型准确率`Accuracy`和损失值是不一致的，当然最好是调整单个参数来观察对模型生成的影响，避免多重因素干扰，难以分辨哪个参数是正影响，哪个参数是负影响。\n",
    "\n",
    "在多次训练时，需要为 `SummaryCollector` 的 `summary_dir` 参数的指定不同的文件夹，否则训练记录的数据会生成在同一个文件夹下，会导致MindInsight展示的数据为非预期。"
   ]
  },
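  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below shows one way to organize such a sweep. It reuses `LeNet5`, `create_dataset`, and the imports from the training cell above, varies only the learning rate, and gives each run its own folder under `./summary_base`; the epoch count of 3 is an arbitrary choice to keep the sweep short."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# train several models that differ only in the learning rate, writing each\n",
    "# run's lineage data to a distinct summary_dir for comparison in MindInsight\n",
    "for lr_value in (0.1, 0.01, 0.001):\n",
    "    network = LeNet5()\n",
    "    net_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')\n",
    "    net_opt = nn.Momentum(network.trainable_params(), lr_value, 0.9)\n",
    "    model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n",
    "    summary_collector = SummaryCollector(\n",
    "        summary_dir=\"./summary_base/lenet_lr_{}\".format(lr_value),\n",
    "        collect_specified_data={\"collect_train_lineage\": True, \"collect_eval_lineage\": True, \"collect_dataset_graph\": True},\n",
    "        keep_default_action=False)\n",
    "    ds_train = create_dataset(\"./MNIST_Data/train\", 32)\n",
    "    model.train(3, ds_train, callbacks=[summary_collector], dataset_sink_mode=True)"
   ]
  },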
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据溯源\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "点击模型溯源，如下图数据溯源界面："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![image](https://gitee.com/mindspore/docs/raw/master/tutorials/notebook/mindinsight/images/data_lineage.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据溯源的是记录了每次训练中对数据集进行操作的流程，通过分析数据溯源，查看数据预处理过程中是否有遗漏的步骤或者不合理的操作。\n",
    "\n",
    "如图所示，图中几次训练的数据增强过程由原数据集MnistDataset开始，按照先后顺序经过了下面的操作：\n",
    "\n",
    "- label的数据类型转换（`Map_Typecast`）\n",
    "- 图像的高宽缩放（`Map_Resize`）\n",
    "- 图像的比例缩放（`Map_Rescale`）\n",
    "- 图像数据的张量变换（`Map_HWC2CHW`）\n",
    "- 图像混洗（`Shuffle`）\n",
    "- 图像成组（`Batch`）\n",
    "- 图像数量增强（`Repeat`）\n",
    "\n",
    "在数据溯源中，可以对不同训练所使用的数据预处理操作进行对比，快速发现数据预处理中存在的问题。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上就是本次对MindInsight的模型溯源和数据溯源的体验全过程。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}